    Interactive Synthesis of Temporal Specifications from Examples and Natural Language

    The robot routing problem for collecting aggregate stochastic rewards

    We propose a new model for formalizing reward collection problems on graphs with dynamically generated rewards, which may appear and disappear according to a stochastic model. The robot routing problem is modeled as a graph whose nodes are stochastic processes generating potential rewards over discrete time. Rewards are generated according to these processes, but at each step an existing reward disappears with a given probability. The edges of the graph encode the (unit-distance) paths between the rewards' locations. On visiting a node, the robot collects the reward accumulated at that node up to that time, but traveling between nodes takes time. The optimization question is to compute an optimal (or epsilon-optimal) path that maximizes the expected collected reward. We consider both the finite- and infinite-horizon robot routing problems: for the finite horizon the goal is to maximize the total expected reward, while for the infinite horizon we consider limit-average objectives. We study the computational and strategy complexity of these problems, establish NP lower bounds, and show that optimal strategies require memory in general. We also provide an algorithm for computing epsilon-optimal infinite paths for arbitrary epsilon > 0.
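
    The following is a minimal, hypothetical sketch of the reward model described in this abstract: it estimates, by Monte Carlo simulation, the expected reward collected along one fixed finite-horizon path. The Bernoulli arrival process, the single reward value, the simplification that a node's whole accumulated reward vanishes at once, and all identifiers are assumptions for illustration, not the paper's algorithm for computing (epsilon-)optimal paths.

```python
# Hedged sketch, not the paper's algorithm: Monte Carlo estimate of the
# expected reward collected along one fixed finite-horizon path in the
# stochastic-reward graph model described above.
import random

def estimate_path_reward(path, arrival_prob, reward_value, vanish_prob, runs=10_000):
    """Average reward collected when the robot visits the nodes of `path`,
    one node per discrete time step."""
    nodes = set(path)
    total = 0.0
    for _ in range(runs):
        pending = {v: 0.0 for v in nodes}   # accumulated, uncollected reward per node
        collected = 0.0
        for v in path:                      # one move (= one time step) per entry
            for u in nodes:
                if random.random() < vanish_prob:
                    pending[u] = 0.0        # the reward waiting at u disappears
                if random.random() < arrival_prob:
                    pending[u] += reward_value   # a fresh reward appears at u
            collected += pending[v]         # collect on arrival at v
            pending[v] = 0.0
        total += collected
    return total / runs

# Toy usage: path 0 -> 1 -> 2 -> 1 -> 0 on a 3-node line graph.
print(estimate_path_reward([0, 1, 2, 1, 0], arrival_prob=0.3,
                           reward_value=1.0, vanish_prob=0.1))
```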

    Antlab: A Multi-Robot Task Server

    Reinforcement Learning with Stochastic Reward Machines

    Minors and categorical resolutions

    Precise but Natural Specification for Robot Tasks

    We present Flipper, a natural language interface for describing high-level task specifications for robots, which are compiled into robot actions. Flipper starts with a formal core language for task planning that allows expressing rich temporal specifications, and uses a semantic parser to provide a natural language interface. Flipper provides immediate visual feedback by executing an automatically constructed plan of the task in a graphical user interface, which allows the user to resolve potentially ambiguous interpretations. Flipper extends itself via naturalization: its users can add definitions for utterances, from which Flipper induces new rules and adds them to the core language, gradually growing an increasingly natural task specification language. Flipper improves naturalization by generalizing the definitions provided by users. Unlike other task-specification systems, Flipper enables natural language interactions while maintaining the expressive power and formal precision of a programming language. We show through an initial user study that natural language interactions and generalization can considerably ease the description of tasks. Moreover, over time, users employ more and more concepts outside of the initial core language. Such extensions are available to the Flipper community, and users can build on concepts that others have defined.
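
    A toy, hypothetical sketch of the naturalization idea described above: user-added definitions map new utterances to sequences of already-understood ones, so the accepted language grows over time. The command names and functions below are invented for illustration and are not Flipper's actual core language or semantic parser.

```python
# Toy illustration of "naturalization": definitions expand recursively into
# core commands, so the accepted language grows as users add definitions.
CORE = {"move north", "move south", "move east", "move west", "pick up", "drop"}
definitions = {}   # utterance -> list of core or previously defined utterances

def define(utterance, body):
    """Add a user definition; every step must already be interpretable."""
    for step in body:
        interpret(step)            # raises if a step is unknown
    definitions[utterance] = list(body)

def interpret(utterance):
    """Expand an utterance into a flat list of core commands."""
    if utterance in CORE:
        return [utterance]
    if utterance in definitions:
        return [cmd for step in definitions[utterance] for cmd in interpret(step)]
    raise ValueError(f"unknown utterance: {utterance!r}")

define("grab the box", ["move north", "pick up"])
define("deliver the box", ["grab the box", "move south", "drop"])
print(interpret("deliver the box"))
# ['move north', 'pick up', 'move south', 'drop']
```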

    Choosing the Initial State for Online Replanning

    The need to replan arises in many applications. However, in the context of planning as heuristic search, it raises an annoying problem: if the previous plan is still executing, what should the new plan search take as its initial state? If it were possible to accurately predict how long replanning would take, it would be easy to find the appropriate state at which control would transfer from the previous plan to the new one. But as planning problems can vary enormously in their difficulty, this prediction can be difficult. Many current systems merely use a manually chosen constant duration. In this paper, we show how such ad hoc solutions can be avoided by integrating the choice of the appropriate initial state into the search process itself. The search is initialized with multiple candidate initial states, and a time-aware evaluation function is used to prefer plans whose total goal-achievement time is minimal. Experimental results show that this approach yields better behavior than either guessing a constant or trying to predict replanning time in advance. By making replanning more effective and easier to implement, this work aids in creating planning systems that can better handle the inevitable exigencies of real-world execution.
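
    A minimal sketch of the time-aware evaluation idea described above, under the assumption that each candidate initial state is a state reachable along the currently executing plan: candidates are ordered by transfer time plus a heuristic estimate of the remaining cost to the goal. The dataclass, the heuristic, and the toy numbers are illustrative, not the paper's planner.

```python
# Hedged sketch: rank candidate initial states by a time-aware evaluation
# f(candidate) = transfer_time + h(state), i.e. estimated goal-achievement time.
from dataclasses import dataclass

@dataclass
class Candidate:
    state: int            # world state after executing a prefix of the old plan
    transfer_time: float  # time at which control could transfer to a new plan

def rank_initial_states(states_along_plan, heuristic):
    """Order candidate initial states by estimated total goal-achievement time."""
    candidates = [Candidate(s, t) for s, t in states_along_plan]
    return sorted(candidates, key=lambda c: c.transfer_time + heuristic(c.state))

# Toy 1-D world: states are distances to the goal; action durations vary, so
# later candidates are reached at unevenly spaced times.
plan_prefix_states = [(5, 0.0), (4, 0.5), (3, 1.0), (2, 4.0)]
best_first = rank_initial_states(plan_prefix_states, heuristic=lambda d: float(d))
print([(c.state, c.transfer_time) for c in best_first])
# best candidate here: state 3, reachable at time 1.0 (f = 4.0)
```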

    Advice-Guided Reinforcement Learning in a non-Markovian Environment

    We study a class of reinforcement learning tasks in which the agent receives its reward sparsely, for complex, temporally extended behaviors. For such tasks, the problem is how to augment the state space so as to make the reward function Markovian in an efficient way. While some existing solutions assume that the reward function is explicitly provided to the learning algorithm (e.g., in the form of a reward machine), others learn the reward function from interactions with the environment, assuming no prior knowledge provided by the user. In this paper, we generalize both approaches and enable the user to give advice to the agent, representing the user’s best knowledge about the reward function, which may be fragmented, partial, or even incorrect. We formalize advice as a set of DFAs and present a reinforcement learning algorithm that takes advantage of such advice, with an optimal convergence guarantee. The experiments show that using well-chosen advice can reduce the number of training steps needed for convergence to an optimal policy, and can decrease the computation time to learn the reward function by up to two orders of magnitude.
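
    A minimal sketch of the state-augmentation idea behind DFA-shaped advice, assuming a toy DFA for "see event a, then event b": tracking the DFA state alongside the environment state makes a sparse, temporally extended reward Markovian, so ordinary tabular q-learning applies on the product state. The reward placement and update rule below are assumptions for illustration, not the paper's algorithm or its convergence machinery.

```python
# Augment the environment state with the state of a small DFA over events,
# then run ordinary tabular q-learning on (env_state, dfa_state) pairs.
from collections import defaultdict

DFA_DELTA = {(0, 'a'): 1, (1, 'b'): 2}   # toy DFA: states 0 -> 1 -> 2, 2 accepting
ACCEPTING = {2}

def dfa_step(u, event):
    return DFA_DELTA.get((u, event), u)  # ignore events the DFA does not track

Q = defaultdict(float)                   # Q[((env_state, dfa_state), action)]
ALPHA, GAMMA = 0.1, 0.95

def q_update(env_s, dfa_s, action, event, next_env_s, actions):
    """One tabular q-learning step on the augmented (env, DFA) state space."""
    next_dfa_s = dfa_step(dfa_s, event)
    # sparse task reward, granted only when the DFA first reaches acceptance
    reward = 1.0 if next_dfa_s in ACCEPTING and dfa_s not in ACCEPTING else 0.0
    best_next = max(Q[(next_env_s, next_dfa_s), a] for a in actions)
    td_target = reward + GAMMA * best_next
    Q[(env_s, dfa_s), action] += ALPHA * (td_target - Q[(env_s, dfa_s), action])
    return next_dfa_s

# Example step: the agent took action 'right', observed event 'a', landed in env state 1.
next_dfa = q_update(env_s=0, dfa_s=0, action='right', event='a',
                    next_env_s=1, actions=['left', 'right'])
```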

    Joint Inference of Reward Machines and Policies for Reinforcement Learning

    Incorporating high-level knowledge is an effective way to expedite reinforcement learning (RL), especially for complex tasks with sparse rewards. We investigate an RL problem where the high-level knowledge is in the form of reward machines, a type of Mealy machine that encodes non-Markovian reward functions. We focus on a setting in which this knowledge is not available a priori to the learning agent. We develop an iterative algorithm that performs joint inference of reward machines and policies for RL (more specifically, q-learning). In each iteration, the algorithm maintains a hypothesis reward machine and a sample of RL episodes. It uses a separate q-function defined for each state of the current hypothesis reward machine to determine the policy, and performs RL to update the q-functions. While performing RL, the algorithm updates the sample by adding RL episodes along which the obtained rewards are inconsistent with the rewards predicted by the current hypothesis reward machine. In the next iteration, the algorithm infers a new hypothesis reward machine from the updated sample. Based on an equivalence relation between states of reward machines, we transfer the q-functions between the hypothesis reward machines of consecutive iterations. We prove that the proposed algorithm converges almost surely to an optimal policy in the limit. The experiments show that learning high-level knowledge in the form of reward machines leads to fast convergence to optimal policies in RL, while baseline RL methods fail to converge to optimal policies even after a substantial number of training steps.
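
    A structural sketch of the iterative loop described in this abstract, with the hard parts (running q-learning episodes under the hypothesis reward machine, inferring a machine consistent with the counterexample sample, and transferring q-functions between machines) abstracted behind caller-supplied functions. The class and function names are hypothetical and the sketch omits the paper's convergence argument.

```python
# Structural sketch only: run_episode, infer_reward_machine and transfer_q are
# caller-supplied placeholders; names and signatures are hypothetical.
class RewardMachine:
    def __init__(self, states, delta, rewards, initial=0):
        self.states = states      # e.g. [0, 1, ...]
        self.delta = delta        # (rm_state, label) -> next rm_state
        self.rewards = rewards    # (rm_state, label) -> reward
        self.initial = initial

    def predicted_rewards(self, labels):
        """Reward sequence the machine assigns to a sequence of event labels."""
        u, out = self.initial, []
        for label in labels:
            out.append(self.rewards.get((u, label), 0.0))
            u = self.delta.get((u, label), u)
        return out

def joint_inference(run_episode, infer_reward_machine, transfer_q, n_iter=1000):
    """Alternate q-learning under a hypothesis reward machine (RM) with
    re-inference of the RM from traces whose rewards contradict it."""
    rm = RewardMachine(states=[0], delta={}, rewards={})  # trivial initial hypothesis
    q_tables = {u: {} for u in rm.states}                 # one q-function per RM state
    sample = []                                           # inconsistent traces seen so far
    for _ in range(n_iter):
        labels, observed = run_episode(rm, q_tables)      # one RL episode, updates q_tables
        if rm.predicted_rewards(labels) != observed:      # hypothesis contradicted
            sample.append((labels, observed))
            new_rm = infer_reward_machine(sample)         # machine consistent with sample
            q_tables = transfer_q(q_tables, rm, new_rm)   # reuse q-values where states match
            rm = new_rm
    return rm, q_tables
```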